Art of Assembly: Chapter Eight
- 8.9 - The END Directive
- 8.10 - Variables
- 8.11 - Label Types
- 8.11.1 - How to Give a Symbol a Particular Type
- 8.11.2 - Label Values
- 8.11.3 - Type Conflicts
- 8.12 - Address Expressions
- 8.12.1 - Symbol Types and Addressing Modes
- 8.12.2 - Arithmetic and Logical Operators
- 8.12.3 - Coercion
8.9 The END Directive
The end directive terminates an assembly language source file. In addition to
telling MASM that it has reached the end of an assembly language source file,
the end directive's optional operand tells MS-DOS where to transfer control
when the program begins execution; that is, you specify the name of the main
procedure as an operand to the end directive. If the end directive's operand
is not present, MS-DOS will begin execution starting at the first byte in
the .exe file. Since it is often inconvenient to guarantee that your main
program begins with the first byte of object code in the .exe file, most
programs specify a starting location as the operand to the end directive. If
you are using the SHELL.ASM file as a skeleton for your assembly language
programs, you will notice that the end directive already specifies the
procedure main as the starting point for the program.
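As a rough sketch of how this looks in practice, the following skeleton names its main procedure on the end directive. It is a stripped-down stand-in for SHELL.ASM, not a copy of it; the segment layout and the DOS terminate call inside Main are illustrative assumptions:

```asm
; Minimal skeleton. Segment names follow the text; the body of Main
; is only a placeholder that returns to DOS.
CSEG            segment para public 'CODE'
                assume  cs:CSEG

Main            proc    far
                mov     ah, 4Ch         ; DOS "terminate program" function
                mov     al, 0           ; return code zero
                int     21h
Main            endp

CSEG            ends

SSEG            segment para stack 'STACK'
                word    256 dup (?)     ; stack space
SSEG            ends

                end     Main            ; execution begins at Main
```

Removing the operand (a bare end) would leave the starting location unspecified for this module.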
If you are using separate assembly and you're linking together several different
object code files (see "Managing Large
Programs" on page 425), only one module can have a main program.
Likewise, only one module should specify the starting location of the program.
If you specify more than one starting location, you will confuse the linker
and it will generate an error.
8.10 Variables
Global variable declarations use the byte/sbyte/db, word/sword/dw,
dword/sdword/dd, qword/dq, and tbyte/dt pseudo-opcodes.
Although you can place your variables in any segment (including the code
segment), most beginning assembly language programmers place all their global
variables in a single data segment.
A typical variable declaration takes the form:
varname byte initial_value
Varname is the name of the variable you're declaring and initial_value
is the initial value you want that variable to have when the program
begins execution. "?" is a special initial value. It means that
you don't want to give a variable an initial value. When DOS loads a program
containing such a variable into memory, it does not initialize this variable
to any particular value.
The declaration above reserves storage for a single byte. This could be
changed to any other variable type by simply changing the byte mnemonic
to some other appropriate pseudo-opcode.
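For instance, a data segment along the following lines shows the same declaration form with a few different pseudo-opcodes. The variable names and initial values are made up for illustration:

```asm
DSEG            segment para public 'DATA'

Count           byte    0               ; one byte, initialized to zero
Total           word    1000            ; one word (16 bits), initialized to 1000
Delta           sword   -1              ; one signed word, initialized to -1
BigValue        dword   ?               ; one double word, no initial value

DSEG            ends
```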
For the most part, this text will assume that you declare all variables
in a data segment, that is, a segment that the 80x86's ds register will point
at. In particular, most of the programs herein will place all variables in
the DSEG segment (CSEG is for code, DSEG is for data, and SSEG is for the
stack). See the SHELL.ASM program in Chapter Four for more details on these
segments.
Since Chapter Five covers the declaration of variables, data types, structures,
arrays, and pointers in depth, this chapter will not waste any more time
discussing this subject. Refer to Chapter Five for more details.
8.11 Label Types
One unusual feature of Intel syntax assemblers (like MASM) is that they
are strongly typed. A strongly typed assembler associates a certain type
with symbols declared in the source file and will generate a warning
or an error message if you attempt to use a symbol in a context that
doesn't allow its particular type. Although unusual in an assembler, most
high level languages apply certain typing rules to symbols declared in the
source file. Pascal, of course, is famous for being a strongly typed language.
You cannot, in Pascal, assign a string to a numeric variable or attempt
to assign an integer value to a procedure label. Intel, in designing the
syntax for 8086 assembly language, decided that all the reasons for using
a strongly typed language apply to assembly language as well as Pascal.
Therefore, standard Intel syntax 80x86 assemblers, like MASM, impose certain
type restrictions on the use of symbols within your assembly language programs.
8.11.1 How to Give a Symbol a Particular Type
Symbols, in an 80x86 assembly language program, may be one of eight
different primitive types: byte, word, dword, qword, tbyte, near, far, and
abs (constant). Anytime you define a label with the byte, word, dword, qword,
or tbyte pseudo-opcodes, MASM associates the type of that pseudo-opcode with
the label. For example, the following variable declaration will create a
symbol of type byte:
BVar byte ?
Likewise, the following defines a dword symbol:
DWVar dword ?
Variable types are not limited to the primitive types built into MASM. If
you create your own types using the typedef or struct directives, MASM will
associate those types with any associated variable declarations.
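A brief sketch of user-defined types follows. The type and variable names here (integer, char, Point, and so on) are chosen for illustration and assume MASM 6.x syntax:

```asm
integer         typedef sword           ; "integer" becomes a synonym for sword
char            typedef byte            ; "char" becomes a synonym for byte

Point           struct
X               word    ?
Y               word    ?
Point           ends

IVar            integer ?               ; IVar has type integer (a signed word)
CVar            char    'A'             ; CVar has type char (a byte)
Pt1             Point   {}              ; Pt1 has type Point, fields uninitialized
```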
You can define near symbols (also known as statement labels) in a couple
of different ways. First, all procedure symbols declared with the proc
directive (with either a blank operand field or near in the operand field)
are near symbols. Statement labels are also near symbols.
A statement label takes the following form:
label: instr
Instr represents an 80x86 instruction. Note that a colon must follow the
symbol. The colon is not part of the symbol; it informs the assembler
that this symbol is a statement label and should be treated as a near
typed symbol.
Statement labels are often the targets of jump and loop instructions. For
example, consider the following code sequence:
                mov     cx, 25
Loop1:          mov     ax, cx
                call    PrintInteger
                loop    Loop1
The loop instruction decrements the cx register and transfers control to the
instruction labelled by Loop1 until cx becomes zero.
Inside a procedure, statement labels are local. That is, statement labels
defined inside a procedure are visible only to code inside that procedure.
If you want to make a symbol visible outside its procedure, place two colons
after the symbol name. In the example above, if you needed to refer to Loop1
outside of the enclosing procedure, you would use the code:
                mov     cx, 25
Loop1::         mov     ax, cx
                call    PrintInteger
                loop    Loop1
Generally, far symbols are the targets of jump and call instructions. The
most common method programmers use to create a far label is to place far
in the operand field of a proc directive. Symbols that are simply constants
are normally defined with the equ directive. You can also declare symbols
with different types using the equ and extrn/extern/externdef directives. An
explanation of the extrn directives appears in the section "Managing Large
Programs" on page 425.
If you declare a numeric constant using an equate, MASM assigns the type
abs (absolute, or constant) to the symbol. Text and string equates are given
the type text. You can also assign an arbitrary type to a symbol using the
equ directive; see "Type Operators" on page 392 for more details.
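To pull these cases together, here is a short sketch that gives symbols the far, near, abs, and text types. All of the names are hypothetical:

```asm
FarProc         proc    far             ; FarProc is a far symbol
                ret                     ; assembles as a far return
FarProc         endp

NearProc        proc    near            ; NearProc is a near symbol
                ret                     ; assembles as a near return
NearProc        endp

MaxItems        equ     100             ; numeric equate: type abs
wptr            textequ <word ptr>      ; text equate: type text
```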
8.11.2 Label Values
Whenever you define a label using a directive or pseudo-opcode, MASM
gives it a type and a value. The value MASM gives the label is usually the
current location counter value. If you define the symbol with an equate,
the equate's operand usually specifies the symbol's value. When encountering
the label in an operand field, as with the loop instruction above, MASM
substitutes the label's value for the label.
8.11.3 Type Conflicts
Since MASM supports strongly typed symbols, the next question to
ask is "What are they used for?" In a nutshell, strongly typed
symbols can help verify proper operation of your assembly language programs.
Consider the following code sections:
DSEG segment public 'DATA'
.
.
.
I byte ?
.
.
.
DSEG ends
CSEG segment public 'CODE'
.
.
.
mov ax, I
.
.
.
CSEG ends
end
The mov instruction in this example is attempting to load the ax register
(16 bits) from a byte sized variable. Now the 80x86 microprocessor is
perfectly capable of this operation. It would load the al register from the
memory location associated with I and load the ah register from the next
successive memory location (which is probably the L.O. byte of some other
variable). However, this probably wasn't the original intent. The person who
wrote this code probably forgot that I is a byte sized variable and assumed
that it was a word variable, which is definitely an error in the logic of
the program.
MASM would never allow an instruction like the one above to be assembled
without generating a diagnostic message. This can help you find errors in
your programs, particularly difficult-to-find errors. On occasion, advanced
assembly language programmers may want to execute a statement like the one
above. MASM provides certain coercion operators that bypass MASM's safety
mechanisms and allow illegal operations (see "Coercion" on page 390).
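If the programmer really wanted the value of the byte variable I in ax, one type-correct way to get it might look like the sketch below. It assumes I holds an unsigned value; the movzx form additionally assumes an 80386 or later processor with a .386 directive in effect:

```asm
; I is the byte variable from the example above.
                mov     al, I           ; load just the byte; types match
                mov     ah, 0           ; zero extend the value into ax

; Alternative on an 80386 or later (requires .386):
                movzx   ax, I           ; load the byte and zero extend in one step
```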
8.12 Address Expressions
An address expression is an algebraic expression that produces a numeric
result that MASM merges into the displacement field of an instruction. An
integer constant is probably the simplest example of an address expression.
The assembler simply substitutes the value of the numeric constant for the
specified operand. For example, the following instruction fills the immediate
data fields of the mov
instruction with zeros:
mov ax, 0
Another simple form of an address expression is a symbol. Upon encountering
a symbol, MASM substitutes the value of that symbol. For example, the following
two statements emit the same object code as the instruction above:
Value equ 0
mov ax, Value
An address expression, however, can be much more complex than this. You
can use various arithmetic and logical operators to modify the basic value
of some symbols or constants.
Keep in mind that MASM computes address expressions during assembly, not
at run time. For example, the following instruction does not load ax from
location Var1 and add one to it:
mov ax, Var1+1
Instead, this instruction loads the al register with the byte stored at the
address of Var1 plus one and then loads the ah register with the byte stored
at the address of Var1 plus two.
Beginning assembly language programmers often confuse computations done
at assembly time with those done at run time. Take extra care to remember
that MASM computes all address expressions at assembly time!
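The contrast can be sketched as follows, assuming Var1 is a word variable declared elsewhere:

```asm
; Assembly time: MASM adds one to the ADDRESS of Var1 while assembling.
                mov     ax, Var1+1      ; loads the word stored at address Var1+1

; Run time: the CPU adds one to the VALUE of Var1 when the code executes.
                mov     ax, Var1        ; loads the word stored at Var1
                add     ax, 1           ; ax now holds that value plus one
```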
8.12.1 Symbol Types and Addressing Modes
Consider the following instruction:
jmp Location
Depending on how the label Location is defined, this jmp instruction will
perform one of several different operations. If you look back at the chapter
on the 80x86 instruction set, you'll notice that the jmp instruction takes
several forms. As a recap, they are:
jmp label (short)
jmp label (near)
jmp label (far)
jmp reg (indirect near, through register)
jmp mem/reg (indirect near, through memory)
jmp mem/reg (indirect far, through memory)
Notice that MASM uses the same mnemonic (jmp) for each of these
instructions; how does it tell them apart? The secret lies with the operand.
If the operand is a statement label within the current segment, the assembler
selects one of the first two forms depending on the distance to the target
instruction. If the operand is a statement label within a different segment,
then the assembler selects jmp (far) label. If the operand following the jmp
instruction is a register, then MASM uses the indirect near jmp and the
program jumps to the address in the register. If a memory location is
selected, the assembler uses one of the following jumps:
- NEAR if the variable was declared with word/sword/dw
- FAR if the variable was declared with dword/sdword/dd
An error results if you've used byte/sbyte/db, qword/dq, or tbyte/dt or some
other type.
If you've specified an indirect address, e.g., jmp [bx], the assembler will
generate an error because it cannot determine if bx is pointing at a word or
a dword variable. For details on how you specify the size, see the section on
coercion in this chapter.
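The following fragment sketches how the operand's type selects the jump form. NearPtr and FarPtr are hypothetical variables, and the fragment assumes they have been loaded with valid target addresses before the jumps execute:

```asm
; In the data segment:
NearPtr         word    ?               ; holds a 16-bit offset
FarPtr          dword   ?               ; holds a segment:offset pair

; In the code segment:
                jmp     NearPtr         ; indirect near jump (word variable)
                jmp     FarPtr          ; indirect far jump (dword variable)
                jmp     bx              ; indirect near jump through a register
                jmp     word ptr [bx]   ; indirect near: bx points at a word
                jmp     dword ptr [bx]  ; indirect far: bx points at a dword
```

The last two lines use the ptr operator to supply the size information that [bx] alone does not carry.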
8.12.2 Arithmetic and Logical Operators
MASM recognizes several arithmetic and logical operators. The following
tables provide a list of such operators:
Arithmetic Operators
Operator | Syntax | Description |
---|---|---|
+ | +expr | Positive (unary) |
- | -expr | Negation (unary) |
+ | expr + expr | Addition |
- | expr - expr | Subtraction |
* | expr * expr | Multiplication |
/ | expr / expr | Division |
MOD | expr MOD expr | Modulo (remainder) |
[ ] | expr [ expr ] | Addition (index operator) |
Logical Operators
Operator | Syntax | Description |
---|---|---|
SHR | expr SHR expr | Shift right |
SHL | expr SHL expr | Shift left |
NOT | NOT expr | Logical (bit by bit) NOT |
AND | expr AND expr | Logical AND |
OR | expr OR expr | Logical OR |
XOR | expr XOR expr | Logical XOR |
Relational Operators
Operator | Syntax | Description |
---|---|---|
EQ | expr EQ expr | True (-1, all bits set) if equal, false (0) otherwise |
NE | expr NE expr | True (-1, all bits set) if not equal, false (0) otherwise |
LT | expr LT expr | True (-1, all bits set) if less, false (0) otherwise |
LE | expr LE expr | True (-1, all bits set) if less or equal, false (0) otherwise |
GT | expr GT expr | True (-1, all bits set) if greater, false (0) otherwise |
GE | expr GE expr | True (-1, all bits set) if greater or equal, false (0) otherwise |
You must not confuse these operators with 80x86 instructions! The addition
operator adds two values together; their sum becomes an operand to an
instruction. This addition is performed when assembling the program, not at
run time. If you need to perform an addition at execution time, use the add
or adc instructions.
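As a small illustration of the difference, the operators in the first two mov instructions below are evaluated by MASM and never appear in the object code, while the add instruction executes on the CPU. BufSize and BitPat are hypothetical equates:

```asm
BufSize         equ     256
BitPat          equ     (1 shl 3) or 1          ; MASM computes the value 9

                mov     cx, BufSize/2 - 1       ; same object code as mov cx, 127
                mov     al, BitPat              ; same object code as mov al, 9

                add     cx, BufSize             ; this addition happens at run time
```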
You're probably wondering "What are these operators used for?"
The truth is, not much. The addition operator gets used quite a bit, the
subtraction somewhat, the comparisons once in a while, and the rest even
less. Since addition and subtraction are the only operators beginning assembly
language programmers regularly employ, this discussion considers only those
two operators and brings up the others as required throughout this text.
The addition operator takes two forms: expr+expr or expr[expr]. For example,
the following instruction loads the accumulator, not from memory location
COUNT, but from the very next location in memory:
mov al, COUNT+1
The assembler, upon encountering this statement, will compute the sum of
COUNT's address plus one. The resulting value is the memory address for this
instruction. As you may recall, the mov al, memory instruction is three bytes
long and takes the form:
Opcode | L. O. Displacement Byte | H. O. Displacement Byte
The two displacement bytes of this instruction contain the sum COUNT+1.
The expr[expr] form of the addition operation is for accessing elements of
arrays. If AryData is a symbol that represents the address of the first
element of an array, AryData[5] represents the address five bytes past the
start of AryData (that is, the byte at index five when counting from zero).
The expression AryData+5 produces the same result, and you can use the two
forms interchangeably; for arrays, however, the expr[expr] form is a little
more self documenting. One trap to avoid: expr1[expr2][expr3] does not
automatically index (properly) into a two dimensional array for you. This
simply computes the sum expr1+expr2+expr3.
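The fragment below sketches both points. It assumes AryData is a byte array holding four rows of eight columns in row-major order; NumCols and the array dimensions are made up for the example:

```asm
NumCols         equ     8                       ; hypothetical row length, in bytes

; In the data segment:
AryData         byte    4*NumCols dup (?)       ; 4 rows by 8 columns of bytes

; In the code segment:
                mov     al, AryData[5]          ; same as mov al, AryData+5
                mov     al, AryData[2][3]       ; also AryData+5, NOT row 2, column 3
                mov     al, AryData[2*NumCols+3] ; row 2, column 3 (address AryData+19)
```

For a constant row and column, you have to write out the row-major arithmetic yourself, as the last line does.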
The subtraction operator works just like the addition operator, except it
computes the difference rather than the sum. This operator will become very
important when we deal with local variables in Chapter 11.
Take care when using multiple symbols in an address expression. MASM restricts
the operations you can perform on symbols to addition and subtraction and
only allows the following forms:
Expression: Resulting type:
reloc + const Reloc, at address specified.
reloc - const Reloc, at address specified.
reloc - reloc Constant whose value is the number of bytes between
the first and second operands. Both variables must
physically appear in the same segment in the
current source file.
Reloc stands for relocatable symbol or expression. This can be a variable
name, a statement label, a procedure name, or any other symbol associated
with a memory location in the program. It could also be an expression that
produces a relocatable result. MASM does not allow any operations other
than addition and subtraction on expressions whose resulting type is relocatable.
You cannot, for example, compute the product of two relocatable symbols.
The first two forms above are very common in assembly language programs.
Such an address expression will often consist of a single relocatable symbol
and a single constant (e.g., "var + 1"). You won't use the third form very
often, but it is very useful once in a while. You can use this form of an
address expression to compute the distance, in bytes, between two points in
your program. The procsize symbol in the following code, for example,
computes the size of Proc1:
Proc1           proc    near
                push    ax
                push    bx
                push    cx
                mov     cx, 10
                lea     bx, SomeArray
                mov     ax, 0
ClrArray:       mov     [bx], ax
                add     bx, 2
                loop    ClrArray
                pop     cx
                pop     bx
                pop     ax
                ret
Proc1           endp
procsize        =       $ - Proc1
"$" is a special symbol MASM uses to denote the current offset within the
segment (i.e., the location counter). It is a relocatable symbol, as is
Proc1, so the equate above computes the difference between the offset at the
start of Proc1 and the end of Proc1. This is the length of the Proc1
procedure, in bytes.
The operands to the operators other than addition and subtraction must be
constants or an expression yielding a constant (e.g., "$-Proc1" above
produces a constant value). You'll mainly use these operators in macros and
with the conditional assembly directives.
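Because a reloc - reloc expression yields a constant, you can feed it to the other operators. One common use, sketched here with hypothetical names, is computing the number of elements in a table:

```asm
; In the data segment:
WArray          word    10, 20, 30, 40, 50
WArrayBytes     =       $ - WArray              ; constant: 10 bytes
WArrayCount     =       ($ - WArray) / 2        ; constant: 5 word elements

; In the code segment:
                mov     cx, WArrayCount         ; same object code as mov cx, 5
```

The equates must immediately follow the table, since "$" is the location counter at the point where the equate appears.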
8.12.3 Coercion
Consider the following program segment:
DSEG segment public 'DATA'
I byte ?
J byte ?
DSEG ends
CSEG segment
.
.
.
mov al, I
mov ah, J
.
.
.
CSEG ends
Since I and J are adjacent, there is no need to use two mov instructions
to load al and ah; a simple mov ax, I instruction would do the same thing.
Unfortunately, the assembler will balk at mov ax, I since I is a byte. The
assembler will complain if you attempt to treat it as a word. As you can see,
however, there are times when you'd probably like to treat a byte variable as
a word (or treat a word as a byte or double word, or treat a double word as
something else).
Temporarily changing the type of a label for some particular occurrence
is coercion. Expressions can be coerced to a different type using the MASM
ptr operator. You use the ptr operator as follows:
type PTR expression
Type is any of byte, word, dword, tbyte, near, far, or other type and
expression is any general expression that is the address of some object. The
coercion operator returns an expression with the same value as expression,
but with the type specified by type. To handle the above problem you'd use
the assembly language instruction:
mov ax, word ptr I
This instructs the assembler to emit the code that will load the ax register
with the word at address I. This will, of course, load al with I and ah
with J.
Code that uses double word values often makes extensive use of the coercion
operator. Since lds and les are the only 32-bit instructions on pre-80386
processors, you cannot (without coercion) store an integer value into a
32-bit variable using the mov instruction on those earlier CPUs. If you've
declared DBL using the dword pseudo-opcode, then an instruction of the form
mov DBL,ax will generate an error because it's attempting to move a 16 bit
quantity into a 32 bit variable. Storing values into a double word variable
requires the use of the ptr operator. The following code demonstrates how to
store the ds and bx registers into the double word variable DBL:
mov word ptr DBL, bx
mov word ptr DBL+2, ds
You will use this technique often as various UCR Standard Library and MS-DOS
calls return a double word value in a pair of registers.
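As a sketch of that pattern, suppose some routine has just returned a 32-bit result in the dx:ax register pair (an assumed convention for this example; check the documentation of the routine you are calling). Saving and later reloading the value might look like this:

```asm
; DBL is the dword variable from the text; dx:ax holds the 32-bit result.
                mov     word ptr DBL, ax        ; store the L.O. word
                mov     word ptr DBL+2, dx      ; store the H.O. word

; If DBL holds a far pointer, les loads both halves without any coercion:
                les     bx, DBL                 ; bx := L.O. word, es := H.O. word
```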
Warning: If you coerce a jmp instruction to perform a far jump to a near
label, other than performance degradation (the far jmp takes longer to
execute), your program will work fine. If you coerce a call to perform a far
call to a near subroutine, you're headed for trouble. Remember, far calls
push the cs register onto the stack (with the return address). When executing
a near ret instruction, the old cs value will not be popped off the stack,
leaving junk on the stack. The very next pop or ret instruction will not
operate properly since it will pop the cs value off the stack rather than the
original value pushed onto the stack.
Expression coercion can come in handy at times. Other times it is essential.
However, you shouldn't get carried away with coercion since data type checking
is a powerful debugging tool built in to MASM. By using coercion, you override
this protection provided by the assembler. Therefore, always take care when
overriding symbol types with the ptr operator.
One place where you'll need coercion is with the mov memory, immediate
instruction. Consider the following instruction:
mov [bx], 5
Unfortunately, the assembler has no way of telling whether bx points at a
byte, word, or double word item in memory. The value of the immediate operand
isn't of any use. Even though five is a byte quantity, this instruction might
be storing the value 0005h into a word variable, or 00000005h into a double
word variable. If you attempt to assemble this statement, the assembler
will generate an error to the effect that you must specify the size of the
memory operand. You can easily accomplish this using the byte ptr, word ptr,
and dword ptr operators as follows:
mov byte ptr [bx], 5 ;For a byte variable
mov word ptr [bx], 5 ;For a word variable
mov dword ptr [bx], 5 ;For a dword variable
Lazy programmers might complain that typing strings like "word ptr" or
"far ptr" is too much work. Wouldn't it have been nice had Intel chosen a
single character symbol rather than these long phrases? Well, quit
complaining and remember the textequ directive. With the equate directive you
can substitute a short symbol for a long string like "word ptr". You'll find
equates like the following in many programs, including several in this text:
byp textequ <byte ptr> ;Remember, "bp" is a reserved symbol!
wp textequ <word ptr>
dp textequ <dword ptr>
np textequ <near ptr>
fp textequ <far ptr>
With equates like the above, you can use statements like the following:
mov byp [bx], 5
mov ax, wp I
mov wp DBL, bx
mov wp DBL+2, ds
Art of Assembly: Chapter Eight - 26 SEP 1996